Plaque Begets Plaque, ApoB Does Not

The Statistics Cause Doubt


John Slough

Study Summary

Plaque Begets Plaque, ApoB Does Not — Soto-Mota et al., 2025

  • Trial ID: NCT05733325

  • Design: 1-year prospective cohort using coronary CT angiography (CCTA)

  • Participants: 100 lean, metabolically healthy adults on keto ≥2 yrs

    • LDL-C ≥190 mg/dL, HDL-C ≥60 mg/dL, TG ≤80 mg/dL, ApoB 185 ± 51 mg/dL

  • Findings: No link between ApoB/LDL-C (baseline/change) and plaque progression

  • Predictors: Baseline plaque (CAC, NCPV, TPS, PAV) strongly predicted progression (ΔNCPV)

  • Stats: Bayesian analysis favored no ApoB–plaque link (6–10× vs alt)

  • Conclusion: Plaque begets plaque; ApoB does not

  1. ΔNCPV as Outcome Variable
  2. Univariable Linear Models
  3. Bayesian Inference
  4. Bayesian Prior Choice
  5. Bayes Factor Interpretation

ΔNCPV as Outcome Variable

Plaque begets plaque: biology or mathematical artifact?

1 ΔNCPV as Outcome Variable

“All baseline plaque metrics (coronary artery calcium, NCPV, total plaque score, and percent atheroma volume) were strongly associated with the change in NCPV.”

Change in noncalcified plaque volume (\(\Delta \text{NCPV}\)) was the outcome:

\[ \Delta \text{NCPV} = \text{NCPV}_{1} - \text{NCPV}_0 \]

They regressed \(\Delta \text{NCPV}\) directly on its baseline value \(\text{NCPV}_0\):

\[ \Delta \text{NCPV} = \alpha + \beta \, \text{NCPV}_0 + \varepsilon \]

But this introduces mathematical coupling, because \(\text{NCPV}_0\) appears on both sides of the equation:

\[ \text{NCPV}_{1} - \text{NCPV}_0 = \alpha + \beta \, \text{NCPV}_0 + \varepsilon \]

1 ΔNCPV as Outcome Variable

“Mathematical coupling occurs when one variable directly or indirectly contains the whole or part of another, and the two variables are then analysed using correlation or regression. As a result, the statistical procedure of testing the null hypothesis — that the coefficient of correlation or the slope of regression is zero — might no longer be appropriate”


Regression model: \(\text{NCPV}_{1} - \text{NCPV}_0 = \alpha + \beta \, \text{NCPV}_0 + \varepsilon\)


The regression coefficient (slope): \(\beta = \frac{\operatorname{Cov}((\text{NCPV}_1 - \text{NCPV}_0),\ \text{NCPV}_0)}{\operatorname{Var}(\text{NCPV}_0)} = \frac{\operatorname{Cov}(\Delta \text{NCPV},\ \text{NCPV}_0)}{\operatorname{Var}(\text{NCPV}_0)}\)


This simplifies to: \(\beta = \frac{\rho\, \sigma_1 - \sigma_0}{\sigma_0}\)

where:

  • \(\rho = \operatorname{Cor}(\text{NCPV}_0,\ \text{NCPV}_1)\)
  • \(\sigma_0\), \(\sigma_1\) = SDs at baseline and follow-up
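
The simplification follows from expanding the covariance in the numerator. A short derivation, using only the symbols already defined above:

\[ \operatorname{Cov}(\text{NCPV}_1 - \text{NCPV}_0,\ \text{NCPV}_0) = \operatorname{Cov}(\text{NCPV}_1,\ \text{NCPV}_0) - \operatorname{Var}(\text{NCPV}_0) = \rho\,\sigma_0\,\sigma_1 - \sigma_0^2 \]

\[ \beta = \frac{\rho\,\sigma_0\,\sigma_1 - \sigma_0^2}{\sigma_0^2} = \frac{\rho\,\sigma_1 - \sigma_0}{\sigma_0} \]

Two special cases show how much of the slope is algebra. If baseline and follow-up are unrelated (\(\rho = 0\)), then \(\beta = -1\). If they are perfectly correlated with equal spread (\(\rho = 1\), \(\sigma_1 = \sigma_0\)), then \(\beta = 0\).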

With baseline NCPV contributing to both predictor and outcome, the slope captures an inseparable combination of mathematical coupling and possible biological change.
It is not a clean estimate of baseline influence.

1 ΔNCPV as Outcome Variable

From Oldham* (1962): \(\beta > 0 \quad\text{if}\quad \rho > \frac{\sigma_0}{\sigma_1}\)

The slope depends on:

  • correlation between baseline and follow-up (\(\rho\))
  • relative spread of the two time points

So a positive or negative slope can arise purely from the math, even when there is no biological relationship.

In this study:

  • Almost all participants had increased NCPV
  • This suggests \(\rho\) was moderate-to-high and \(\sigma_1 > \sigma_0\)

These conditions tend to bias the slope upward; whether the resulting positive association reflects biology, coupling, or both cannot be determined from this model.

*Source: Oldham, 1962, J. Chronic Dis.

1 ΔNCPV as Outcome Variable

set.seed(1) # Fix the random seed so the example is reproducible
n <- 100 # Number of simulated participants
baseline <- rnorm(n, mean = 100, sd = 10) # 100 random values centered around 100
follow_up <- rnorm(n, mean = 120, sd = 10) # 100 independent random values centered around 120
delta <- follow_up - baseline # Change score: follow-up minus baseline


[Figure: two panels, "No true association" and "Association due to mathematical coupling"]
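
To put numbers on the coupling, the simulation can be extended with a correlation and a regression. This is a minimal sketch on simulated data only: the seed and the object names (`r_raw`, `fit_coupled`, `slope_formula`) are assumptions for illustration, not the study's data or the authors' code.

```r
set.seed(42)                                  # assumed seed, for reproducibility only
n <- 100
baseline  <- rnorm(n, mean = 100, sd = 10)    # independent of follow_up by construction
follow_up <- rnorm(n, mean = 120, sd = 10)
delta     <- follow_up - baseline

# Follow-up vs. baseline: no built-in relationship, so the correlation is near 0
r_raw <- cor(baseline, follow_up)

# Change vs. baseline: baseline sits on both sides, so the slope is pulled toward -1
fit_coupled   <- lm(delta ~ baseline)
slope_coupled <- unname(coef(fit_coupled)["baseline"])

# The same slope from (rho * sigma_1 - sigma_0) / sigma_0, using sample values
slope_formula <- (r_raw * sd(follow_up) - sd(baseline)) / sd(baseline)

print(c(r_raw = r_raw, slope_coupled = slope_coupled, slope_formula = slope_formula))
```

The fitted slope and the formula agree exactly, because the least-squares slope is the sample version of the same covariance ratio. A clearly non-zero slope appears even though the two measurements were generated independently.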

1 ΔNCPV as Outcome Variable

An alternative is to model NCPV at follow-up (\(\text{NCPV}_1\)) directly while adjusting for baseline NCPV. This example uses ApoB as the independent variable:

\[ \text{NCPV}_1 = \alpha + \gamma\,\text{NCPV}_0 + \beta\,\text{ApoB} + \varepsilon \]

This approach avoids mathematical coupling, reduces residual variance, and allows the coefficient on \(\text{ApoB}\) to reflect biological association rather than algebraic structure.
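
A minimal sketch of this baseline-adjusted model in R, on simulated data. All numbers here (means, SDs, the true ApoB coefficient of 0.10) and the variable names are hypothetical choices for illustration, not values from the study.

```r
set.seed(1)                                   # assumed seed
n <- 100
ncpv0 <- rnorm(n, mean = 75, sd = 30)         # hypothetical baseline NCPV (mm^3)
apob  <- rnorm(n, mean = 185, sd = 51)        # hypothetical ApoB (mg/dL)

# Hypothetical data-generating process: the true ApoB coefficient is 0.10
ncpv1 <- 20 + 1.1 * ncpv0 + 0.10 * apob + rnorm(n, sd = 10)

# Follow-up as outcome, baseline as covariate, ApoB as the predictor of interest
fit_ancova <- lm(ncpv1 ~ ncpv0 + apob)
apob_coef  <- unname(coef(fit_ancova)["apob"])
apob_ci    <- confint(fit_ancova)["apob", ]

print(round(c(estimate = apob_coef, apob_ci), 3))
```

Baseline enters only on the right-hand side, so its coefficient is estimated rather than fixed by the algebra of a change score.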


You could test whether baseline NCPV predicts follow-up using a mixed-effects model with a Time × Baseline interaction:

\[ \text{NCPV}_{ij} = \alpha + \gamma\,\text{Time}_{ij} + \beta\,\text{NCPV}_{0j} + \delta\,(\text{Time}_{ij} \cdot \text{NCPV}_{0j}) + b_j + \varepsilon_{ij} \]

but with this number of subjects, it must be done carefully.

1 ΔNCPV as Outcome Variable

You cannot determine whether a positive or negative slope from ΔNCPV ∼ NCPV₀ reflects biology or math, because the math builds the relationship. To isolate biological effects, you must model follow-up directly with baseline as a covariate — not as part of the outcome.


“Statisticians have repeatedly warned against correlating/regressing change with baseline due to two methodological concerns known as mathematical coupling and regression to the mean.”

“Mathematical coupling can lead to an artificially inflated association between initial value and change score when correlation or regression is used.”



Plaque begets plaque: biology or a mathematical artifact?



See also:
Analysis of ‘change scores’
Assessing the Relationship between the Baseline Value of a Continuous Variable and Subsequent Change Over Time
Mathematic coupling of data: a common source of error
Revisiting the relation between change and initial value: a review and evaluation

Univariable Linear Models

At best: exploratory.

At worst: misleading.

2 Univariable Linear Models

“Linear models on the primary (NCPV) and secondary outcomes were univariable”

Despite having multiple predictors available (age, sex, ApoB, BMI, triglycerides, systolic blood pressure, CAC, NCPV₀, LDL-C exposure),
each was tested separately in single-predictor regressions.

This modeling choice introduces omitted-variable bias:

“…omitting a relevant variable from a model which explains the independent and dependent variable leads to biased estimates.”
Wilms (2021)

When an omitted variable is correlated with both the included predictor and the outcome, the predictor's coefficient absorbs part of the omitted variable's effect.

2 Univariable Linear Models

Example: ΔNCPV, ApoB and Age

They modeled:

\[ \Delta \text{NCPV} = \alpha + \beta \text{ApoB} + \varepsilon \]

But if age also predicts ΔNCPV and correlates with ApoB, then \(\beta\) is biased — it partly reflects the effect of age.

A more appropriate model would be:

\[ \Delta \text{NCPV} = \alpha + \beta_1 \text{ApoB} + \beta_2 \text{Age} + \varepsilon \]

This separates the contribution of ApoB from that of age.
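
A minimal simulation of this point, with made-up numbers. In this hypothetical setup age drives ΔNCPV, ApoB correlates with age, and ApoB has no effect at all. The variable names and effect sizes are assumptions chosen only to make the bias visible.

```r
set.seed(2)                                   # assumed seed
n    <- 500                                   # larger than the study, to keep the demo stable
age  <- rnorm(n, mean = 55, sd = 10)
apob <- 100 + 1.5 * age + rnorm(n, sd = 30)   # ApoB correlates with age

# ApoB has zero effect by construction; only age moves the outcome
delta_ncpv <- 2 * age + 0 * apob + rnorm(n, sd = 10)

# Univariable model: the ApoB slope picks up the effect of the omitted age variable
slope_unadjusted <- unname(coef(lm(delta_ncpv ~ apob))["apob"])

# Adjusted model: the ApoB slope returns to roughly zero
slope_adjusted <- unname(coef(lm(delta_ncpv ~ apob + age))["apob"])

print(round(c(unadjusted = slope_unadjusted, adjusted = slope_adjusted), 3))
```

Here the bias creates an association that is not there. With a different correlation structure, the same mechanism can hide an association that is real, which is why an unadjusted null is as hard to interpret as an unadjusted positive finding.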


Univariable linear models make confounding almost certain, especially in small, non-randomized human data, undermining any claim of association or non-association.

Univariable Linear Models are exploratory and can be misleading.

2 Univariable Linear Models

But wait:

“Estimated lifetime LDL-C exposure was only a significant predictor of final NCPV in the univariable analysis but lost significance when age was included as a covariate. Both age and lifetime LDL-C exposure lost significance when baseline CAC was included in the model.”

So they did use multivariable models, but only for follow-up NCPV and not for ΔNCPV, the paper’s main endpoint?

This selective use raises questions:

  • Why not model ΔNCPV with adjustment for known confounders?
  • Why apply different modeling rules depending on the predictor?
  • Why include only some adjusted models in the paper?

Were multivariable models used selectively for some reason? And why not on the study’s main endpoint?

Bayesian Modeling

Unusual. Fragile. Overstated.

3 Bayesian Inference

Frequentist:

Assuming there is no true association between ApoB and ΔNCPV, how likely is it that we’d observe a slope as large (or larger) than the one we found, just by chance?

If the p-value is above 0.05, a frequentist analysis can say:

“We did not find sufficient evidence to reject the hypothesis that ApoB has no association with ΔNCPV”

It cannot say the null is likely true, or produce the probability that there is no association, just that the data were inconclusive.


Bayesian:

How well do the data fit under two competing models, one with no association (null), and one with a range of plausible effect sizes for ApoB (alternative)?

A Bayes factor (e.g., BF₀₁ = 6) allows a stronger statement:

“The observed data are 6 times more likely (moderate evidence) under the ‘no association’ model than under the alternative model that assumes some effect from ApoB (as defined by the prior).”

  • Frequentist: “No evidence of effect”
  • Bayesian: “Evidence for no effect”

3 Bayesian Inference

“Since lack of statistical significance (ie, P > 0.05) should not be interpreted as evidence in favor of the null but simply a failure to reject the null, the addition of Bayesian inference adds credence to finding that there is no association between NCPV vs LDL-C or ApoB…”

So, they turn to Bayesian inference to “support” their finding that ApoB has no association with plaque progression.


This is unusual in a non-randomized, uncontrolled, 1-year observational study on a highly restricted sample:

  • Study design not suited for strong inferences about presence or absence of associations
  • Univariable, unadjusted models: reduce credibility of any statistical conclusion
  • Bayesian inference used to imply absence of effect, not just lack of evidence


Despite the limited model and context, they present the result as confirmatory.

3 Bayesian Inference


They are applying a stronger-sounding statistical framework to a structurally weak analysis.

This is a misuse of Bayesian inference.


Not because Bayesian methods are invalid.


Because they’re being used to amplify certainty in an analysis that lacks adjustment, control, or transparency about its assumptions.

4 Bayesian Prior Choice

“Bayes factors were calculated using BayesFactor::regressionBF… and an ~ rscale value of 0.8 to contrast a moderately informative prior with a conservative distribution width (to allow for potential large effect sizes) due to the well-documented association between ApoB changes and coronary plaque changes”


Bayesian Prior: represents your belief about likely effect sizes before seeing the data.

From the BayesFactor documentation for the parameter rscaleCont:

“Several named values are recognized: ‘medium’, ‘wide’, and ‘ultrawide’, which correspond to rscales of √2/4, 1/2, and √2/2, respectively.”

  • “medium” → rscale = √2 / 4 ≈ 0.354
  • “wide” → rscale = 0.5
  • “ultrawide” → rscale = √2 / 2 ≈ 0.707

rscale of 0.8 is wider than “ultrawide”. It is not a “moderately informative” prior. It’s actually a weakly informative or vague prior, placing most of its weight on large effects.

A moderately informative prior would typically correspond to “medium” (≈ 0.354) or “wide” (0.5), which place more mass on smaller effects.
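
To see what these scales mean in practice, the sketch below computes how much prior mass each one puts on large effects. It assumes, for illustration, that the prior on the standardized effect is a zero-centered Cauchy with scale equal to rscale, which is the usual form behind default JZS Bayes factors. The exact parameterization inside regressionBF is not verified here, and the 0.5 cut-off for a "large" standardized effect is a hypothetical choice.

```r
# Named scales from the BayesFactor documentation, plus the authors' value
rscales <- c(medium = sqrt(2) / 4, wide = 1 / 2, ultrawide = sqrt(2) / 2, authors = 0.8)

threshold <- 0.5   # hypothetical cut-off for a "large" standardized effect

# Two-sided prior probability that |effect| exceeds the threshold under Cauchy(0, rscale)
prior_mass_large <- 2 * pcauchy(threshold, location = 0, scale = rscales, lower.tail = FALSE)
names(prior_mass_large) <- names(rscales)

print(round(prior_mass_large, 3))
```

Under this assumption the share of prior mass beyond the cut-off rises steadily with the scale, and half of the mass always lies beyond the scale itself. An rscale of 0.8 therefore puts half of its prior weight on standardized effects larger than 0.8.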

4 Bayesian Prior Choice

The authors’ prior choice isn’t wrong, but their description of it is misleading.

Labeling an r = 0.8 prior as “moderately informative” or “conservative” downplays the fact that it assumes large effects, making small observed effects look unlikely under H₁ and inflating support for H₀.


Their choice of prior is subjective, influential, and not tested for robustness.

  • The model was set up to expect large ApoB effects, so small observed effects are treated as evidence for no effect.

  • Best practice is to run a sensitivity analysis, to see whether conclusions change with different priors.

“it is always important to conduct a prior sensitivity analysis to fully understand the influence that the prior settings have on posterior estimates”

“a researcher can have a very strong opinion about the model parameter values, and this opinion (via the prior) can drive the final model estimates.”

4 Bayesian Prior Choice

Prior Scale Sensitivity Analysis on ΔNCPV ~ ApoB Model


Bayes factor sensitivity analysis
rscale   BF₁₀    BF₀₁
0.100    0.530    1.889
0.250    0.288    3.473
0.350    0.217    4.608
0.500    0.157    6.363
0.707    0.113    8.834
0.800    0.100    9.954
1.000    0.081   12.374

This kind of rscale sensitivity analysis is standard for default Bayes factors, but it’s a limited diagnostic — it tests only prior width, not prior plausibility or model fit.
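
A sketch of how such a sensitivity table can be produced with the BayesFactor package. It runs on simulated data with no true ApoB effect, so the numbers will not match the table above. The data frame `sim`, its column names, and the seed are hypothetical, and the package is assumed to be installed.

```r
library(BayesFactor)   # assumed to be installed

set.seed(4)            # assumed seed
n <- 100
sim <- data.frame(apob = rnorm(n, mean = 185, sd = 51))   # hypothetical ApoB values
sim$delta_ncpv <- rnorm(n, mean = 20, sd = 25)            # hypothetical: no true ApoB effect

rscales <- c(0.1, 0.25, 0.35, 0.5, sqrt(2) / 2, 0.8, 1)

# BF10 for the ApoB model against the intercept-only model, one value per prior scale
bf10 <- sapply(rscales, function(r) {
  bf <- regressionBF(delta_ncpv ~ apob, data = sim, rscaleCont = r, progress = FALSE)
  extractBF(bf)$bf
})

sensitivity <- data.frame(rscale = rscales, BF10 = bf10, BF01 = 1 / bf10)
print(round(sensitivity, 3))
```

Reporting the whole table, rather than a single rscale, lets the reader see how much of the support for the null comes from the width of the prior.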

5 Bayes Factor Interpretation

“In other words, these data suggest it is 6 to 10 times more likely that the hypothesis of no association between these variables (the null) is true as compared to the alternative.”

That is an overstatement of what the Bayes factor tells us.

A Bayes factor of 6–10 means the data are 6–10× more likely under the null model than under the alternative model, not that the null hypothesis is 6–10× more likely to be true.

They could have said: “The data are 6–10 times more likely under the no-association model than under the alternative”

\[ \text{Posterior Odds} = \text{Bayes Factor} \times \text{Prior Odds} \]

To claim that the null is 6–10× more likely to be true, they would have to assume prior odds = 1:1 — and state that explicitly. They didn’t.
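
A worked example shows how much the prior odds matter. The prior odds below are illustrative assumptions, not values taken from the paper. With \(\text{BF}_{01} = 6\) and prior odds of 1:1 for the null:

\[ \text{Posterior Odds} = 6 \times 1 = 6, \qquad P(H_0 \mid \text{data}) = \frac{6}{6 + 1} \approx 0.86 \]

If the prior odds instead lean 1:4 against the null, as someone persuaded by the existing ApoB literature might argue:

\[ \text{Posterior Odds} = 6 \times \tfrac{1}{4} = 1.5, \qquad P(H_0 \mid \text{data}) = \frac{1.5}{1.5 + 1} = 0.60 \]

The same Bayes factor supports very different statements about which hypothesis is true, depending on a prior the authors never stated.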

Source: Bayes Factors, Kass & Raftery, 1995

5 Bayes Factor Interpretation

Even without other issues (e.g. confounding / non-adjusted variables / short follow-up / non-RCT, etc.), the reported BF of 6.3 for ΔNCPV ~ ApoB reflects only moderate evidence for no effect — not strong or decisive.

Table 1. A heuristic classification scheme for Bayes factors (BF₁₀). Source: SpringerLink

(Assuming BF₁₀ as per standard conventions; if BF₀₁, it reverses)

Summary

  1. ΔNCPV as outcome
    Regressed (NCPV₁ − NCPV₀) on baseline NCPV₀
    Mathematical coupling → regression slopes reflect algebra, not just biology

  2. Univariable regressions
    Each predictor tested separately (ApoB, LDL-C, age…)
    No confounder adjustment → biased, unreliable, low credibility estimates

  3. Bayesian inference
    Bayesian inference used to support “no ApoB effect” & “plaque begets plaque”
    Unadjusted, observational data → misleading, unusual use of Bayesian inference

  4. Prior choice (rscale = 0.8)
    Prior assumes large effects
    No sensitivity analysis → results likely prior-driven

  5. Bayes factor interpretation
    Claimed null is “6–10× more likely”
    Bayes factor misstated as posterior probability → compares model fit, not truth

  6. Headline claim
    “Plaque Begets Plaque, ApoB Does Not”
    Overstates evidence → mathematically coupled, confounded, and fragile analysis

Additional Concerns

No Adjustment for Multiple Comparisons

Predictors tested against two outcomes (ΔNCPV, ΔTPS) → ≥10 comparisons

  • Bonferroni: α = 0.05 → α′ ≈ 0.005
    Baseline CAC (P < 0.001) would survive; others might not
  • Benjamini–Hochberg FDR control keeps more power than Bonferroni and remains valid for positively correlated tests

Numerous regressions were run — increasing false positive risk
But no multiple testing correction was applied.
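
Both corrections are one call to `p.adjust()` in base R. The ten p-values below are invented for illustration and are not the paper's results.

```r
# Hypothetical p-values from ten regressions (NOT taken from the paper)
p_raw <- c(0.0005, 0.004, 0.012, 0.03, 0.04, 0.045, 0.2, 0.35, 0.6, 0.9)

p_bonferroni <- p.adjust(p_raw, method = "bonferroni")   # controls the family-wise error rate
p_bh         <- p.adjust(p_raw, method = "BH")           # controls the false discovery rate

adjusted <- data.frame(
  raw        = p_raw,
  bonferroni = p_bonferroni,
  bh         = p_bh
)
print(round(adjusted, 3))

# How many tests stay below 0.05 under each approach
n_significant <- c(
  raw        = sum(p_raw < 0.05),
  bonferroni = sum(p_bonferroni < 0.05),
  bh         = sum(p_bh < 0.05)
)
print(n_significant)
```

In this made-up set, six raw p-values fall below 0.05, but only two survive Bonferroni and three survive Benjamini–Hochberg.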

Perhaps the authors viewed this as exploratory, where correction is often skipped —
but then why title the paper “Plaque Begets Plaque, ApoB Does Not”?

Zero-inflation, censoring, heteroscedasticity

Baseline NCPV median = 44 mm³; TPS median = 0 → at least 50% of TPS values are zero
CCTA cannot report negative plaque → both measures are left-censored at 0

This affects not just modeling but measurement:
When true plaque ≈ 0, error is asymmetric — it can only overestimate.

ΔNCPV, their primary outcome, is a change score between two bounded, skewed measures.
Likely to produce non-normal residuals and heteroscedasticity (e.g., larger spread at higher baseline).

If smaller baseline values were also linked to larger increases,
this may reflect the effects of left-censoring and error asymmetry — not true biological acceleration.

These issues are clear in TPS, and may affect NCPV, but diagnostics are not shown.

OLS assumes homoscedastic, normal residuals
performance::check_model() was run — but no output provided

They could have considered methods to address this such as: Tobit regression, log-transform, or robust SEs

Possibly Underpowered Study

  • Registered primary endpoint: %Δ NCPV over 12 mo; the study was not specifically sized to detect an ApoB effect.
  • With n = 100, a single-predictor linear regression reaches 80% power for medium and large effects, but not for small ones.
  • A null result after one year may be due to low power, not proof of no ApoB effect.
  • Note: even if it is adequately powered, it doesn’t negate all the other issues like mathematical coupling, confounding, non-adjustment, short follow-up etc.
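
The power figures for n = 100 can be checked with base R and the noncentral F distribution. This sketch assumes Cohen's conventional f² benchmarks (0.02, 0.15, 0.35), a two-sided α of 0.05, and the usual noncentrality parameter f² × (u + v + 1). It is a generic calculation, not the authors' power analysis.

```r
n <- 100
u <- 1             # numerator df: one predictor
v <- n - u - 1     # denominator df

# Cohen's conventional benchmarks for f^2 (assumed effect sizes)
f2 <- c(small = 0.02, medium = 0.15, large = 0.35)

# Critical F at alpha = 0.05, then power from the noncentral F distribution
f_crit <- qf(0.95, df1 = u, df2 = v)
power  <- pf(f_crit, df1 = u, df2 = v, ncp = f2 * (u + v + 1), lower.tail = FALSE)
names(power) <- names(f2)

print(round(power, 2))
```

Power clears 80% for the medium and large benchmarks and falls well short for the small one, so a modest ApoB effect over one year could easily go undetected.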

On the Heterogeneity of the LMHR Group

“It should be emphasized that this includes heterogeneity in progression (and regression) across the population.” - KETO-CTA study

“If, despite our results show that CVDrisk among LMHRs is heterogeneous (and thus, a pooled summary isn’t a good idea), you must have a numerical pooled NCPVchange value, it is: p50=18.8 mm3 IQR(37.3).” - X post from Author


They say the group is heterogeneous (to downplay the pooled NCPV change), yet they ran univariable regressions and interpreted pooled Bayes factors as if the group were homogeneous.

If a group is too heterogeneous to report pooled outcomes, it is also too heterogeneous to justify pooled inferences about predictors or mechanisms.

If their CVD risk (e.g., plaque progression) is not coherent, then the category fails as a predictive or explanatory tool.

On the Heterogeneity of the LMHR Group

“p50=18.8 mm3 IQR(37.3).”

With a median of 18.8 mm³ and only 1–2 individuals showing regression, the IQR of 37.3 mm³ must be mostly skewed upward, not balanced.

A wide IQR here reflects high inter-individual variability in how much NCPV progressed among the LMHRs, since all but a few individuals had more plaque at follow-up.

In other words: With nearly all LMHRs showing plaque progression, a wide IQR (37.3 mm³) doesn’t indicate balanced variability—it reflects differing degrees of worsening.

This matters because:

  • The outcome (ΔNCPV) shows high variability across individuals in the LMHR group.
  • That inflates standard errors, weakens statistical power, and makes small observed effects more ambiguous.
  • It calls for adjusted modeling to reduce noise and account for confounding, not pooled univariable regressions and Bayes factors interpreted as strong evidence.

A Note on Letters to the Editor

Letter to the Editor

Response to the Letter

From the response:

“Regarding the analytical points brought forward, we are aware of the relevance of linear assumptions to obtain accurate estimators. Since residual plot evaluation can also be subjective”

Objective, quantitative statistical tests also exist for assessing model assumptions:

  • Normality: Shapiro-Wilk, Kolmogorov–Smirnov, Q-Q plots (visual / quantitative)
  • Homoscedasticity: Breusch–Pagan, White test
  • Influence / leverage: Cook’s distance, leverage plots (visual based on quantitative values)
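
These checks take a few lines of base R. The sketch below uses simulated data that is heteroscedastic by design; the variable names, the data-generating numbers, and the 4/n rule of thumb for Cook's distance are assumptions for illustration. The Breusch–Pagan statistic is computed by hand in its studentized form so that no extra package is needed.

```r
set.seed(3)                                   # assumed seed
n <- 200
baseline <- rexp(n, rate = 1 / 50)            # hypothetical right-skewed baseline values

# Hypothetical outcome whose error spread grows with baseline
change <- 10 + 0.3 * baseline + rnorm(n, sd = 2 + 0.2 * baseline)

fit <- lm(change ~ baseline)
res <- resid(fit)

# Normality of residuals: Shapiro-Wilk test
sw_p <- shapiro.test(res)$p.value

# Homoscedasticity: studentized Breusch-Pagan statistic, n * R^2 from regressing
# squared residuals on the predictor, compared with a chi-square(1) distribution
res_sq <- res^2
aux    <- lm(res_sq ~ baseline)
bp_p   <- pchisq(n * summary(aux)$r.squared, df = 1, lower.tail = FALSE)

# Influence: count points above the common 4 / n rule of thumb for Cook's distance
n_influential <- sum(cooks.distance(fit) > 4 / n)

print(c(shapiro_p = sw_p, breusch_pagan_p = bp_p, n_influential = n_influential))
```

Each check returns a number that can be reported in a supplement, which removes the subjectivity the authors attribute to residual plots.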

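The performance::check_model() function they cite automatically generates visual versions of these checks, and the same package offers formal tests such as check_normality() and check_heteroscedasticity().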

It’s unclear why they wouldn’t include or reference the output.

“…we followed their suggestion and re-ran all models with robust linear regression…as expected, there were small differences with the published estimates, all models using robust regression were consistent with what was reported.”

No output, diagnostics, or model fit provided. We are asked to trust their assertion.

A Note on Letters to the Editor

“We agree that being able to identify patients with rapid plaque progression, and gaining better understanding of the mechanisms that mediate its pace (i.e. insulin resistance, inflammation, different dietary composition elements, etc.) is paramount. We plan to address these risk factors in future reports”

Are they admitting the current analysis lacks adjustment for likely confounders, despite drawing strong conclusions from univariable regressions?

Just run the multivariable models.

“Moreover, our results are compatible with a causal role of ApoB in atherosclerosis, as we have openly acknowledged and supported in previous publications.”

“Plaque Begets Plaque, ApoB does not”

“Along the same lines, we would like to clarify that our title was not meant to be a statement about causality. “Plaque begets plaque” (which, of course, mirrors the proverb “Money begets money”) is frequently used to highlight the strong and clinically relevant association of baseline plaque values with plaque progression rate[7]. In retrospect, we might have chosen “Longitudinal Data from the KETO-CTA Study” as alternative phrasing to avoid misinterpretations.”

A Note on Letters to the Editor

“misinterpretations”